Efficient Matching of Substrings in Uncertain Sequences
نویسندگان
چکیده
Substring matching is fundamental to data mining methods for sequential data. It involves checking the existence of a short subsequence within a longer sequence, ensuring no gaps within a match. Whilst a large amount of existing work has focused on substring matching and mining techniques for certain sequences, there are only a few results for uncertain sequences. Uncertain sequences provide powerful representations for modelling sequence behavioural characteristics in emerging domains, such as bioinformatics, sensor streams and trajectory analysis. In this paper, we focus on the core problem of computing substring matching probability in uncertain sequences and propose an efficient dynamic programming algorithm for this task. We demonstrate our approach is both competitive theoretically, as well as effective and scalable experimentally. Our results contribute towards a foundation for adapting classic sequence mining methods to deal with uncertain data.
منابع مشابه
Efficient algorithms for the longest common subsequence in $k$-length substrings
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...
متن کاملON THE MATCHING NUMBER OF AN UNCERTAIN GRAPH
Uncertain graphs are employed to describe graph models with indeterministicinformation that produced by human beings. This paper aims to study themaximum matching problem in uncertain graphs.The number of edges of a maximum matching in a graph is called matching numberof the graph. Due to the existence of uncertain edges, the matching number of an uncertain graph is essentially an uncertain var...
متن کاملThe Vmatch large scale sequence analysis software
Usually, in a large scale matching problem, extensive portions of the sequences under consideration are static, i.e. they do not change much over time. Therefore it makes sense to preprocess this static data to extract information from it and to store this in a structured manner, allowing efficient searches. Vmatch does exactly this: it preprocesses a set of sequences into an index structure. T...
متن کاملAn Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment
Sending compressed video data in error-prone environments (like the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal the received data in the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...
متن کاملOn the Linear Number of Matching Substrings
We study the number of matching substrings in the pattern matching problem. In general, there can be a quadratic number of matching substrings in the size of a given text. The linearizing restriction enables to find at most a linear number of matching substrings. We first explore two well-known linearizing restriction rules, the longest-match rule and the shortest-match substring search rule, a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014